Smaller Coresets for k-Median and k-Means Clustering

Related articles

Coresets for k-Means and k-Median Clustering and their Applications

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in ℝ^d, one can compute a weighted set S ⊆ P, of size O(k ε^{-d} log n), such that one can compute the k-median/means clustering on S instead of on P, and get a (1 + ε)-approximation. As a result, we impr...
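
To make the notion of a weighted set S standing in for P concrete, here is a minimal Python sketch. It is not the construction from the paper: it only merges duplicate points into weighted representatives, which yields a degenerate exact coreset (ε = 0). The function names `weighted_cost` and `merge_duplicates` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def weighted_cost(weighted_points, centers, power=1):
    """Clustering cost of (point, weight) pairs against a set of centers.

    power=1 gives the k-median cost, power=2 gives the k-means cost.
    """
    total = 0.0
    for point, weight in weighted_points:
        nearest = min(math.dist(point, c) for c in centers)
        total += weight * nearest ** power
    return total

def merge_duplicates(points):
    """Illustrative only: collapse identical points into one weighted point."""
    return list(Counter(points).items())

# P contains repeated points; S keeps one copy of each with its multiplicity.
P = [(0.0, 0.0)] * 3 + [(4.0, 3.0)] * 2
S = merge_duplicates(P)
centers = [(0.0, 0.0)]

full_cost = weighted_cost([(p, 1) for p in P], centers)
small_cost = weighted_cost(S, centers)
print(len(P), len(S), full_cost, small_cost)
```

For any choice of centers the two costs agree here, because S loses no information. A coreset in the sense of the abstract relaxes this to agreement within a factor of (1 + ε) while being much smaller than P.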

BICO: BIRCH Meets Coresets for k-Means Clustering

We design a data stream algorithm for the k-means problem, called BICO, that combines the data structure of the SIGMOD Test of Time award winning algorithm BIRCH [27] with the theoretical concept of coresets for clustering problems. The k-means problem asks for a set C of k centers minimizing the sum of the squared distances from every point in a set P to its nearest center in C. In a data stre...
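
The k-means objective stated in this abstract can be written out directly. The sketch below is a plain evaluation of that cost for a fixed set of centers, assuming points are given as coordinate tuples; it is not the BICO algorithm, and the name `kmeans_cost` is hypothetical.

```python
def kmeans_cost(points, centers):
    """Sum, over all points, of the squared Euclidean distance to the nearest center."""
    total = 0.0
    for p in points:
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            for c in centers
        )
    return total

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]
print(kmeans_cost(points, centers))  # 2.0
```

The first two points each lie at squared distance 1 from the center (0, 1), and the third point coincides with its center, so the total cost is 2.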

Distributed k-Means and k-Median Clustering on General Topologies

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We p...

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...

Coresets and streaming algorithms for the k-means problem and related clustering objectives

The continuing technological advances in different areas represent a challenge for researchers in computer science and in particular in the area of algorithms and theory. The gap between processing speed and data volume increases constantly, even though the performance of computers and their central processing units increases at a fast rate. This is because the data that surrounds us multiplies...

Journal

Journal title: Discrete & Computational Geometry

Year: 2006

ISSN: 0179-5376,1432-0444

DOI: 10.1007/s00454-006-1271-x